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Abstract 

Random finite sets (RFSs) has been a fruitful area of research in recent 
years, yielding new approximate filters such as the probability hypothesis 
density (PHD), cardinalised PHD (CPHD), and multiple target multi- 
Bernoulli (MeMBer). These new methods have largely been based on 
approximations that side-step the need for measurement-to-track associa- 
tion. Comparably, RFS methods that incorporate data association, such 
as Morelande and Challa's (M-C) method, have received little attention. 
This paper provides a RFS algorithm that incorporates data association 
similarly to the M-C method, but retains computational tractability via a 
recently developed approximation of marginal association weights. We de- 
scribe an efficient method for resolving the track coalescence phenomenon 
which is problematic for joint probabilistic data association (JPDA) and 
related methods (including M-C). The method utilises a network flow op- 
timisation, and thus is tractable for large numbers of targets. Finally, 
our derivation also shows that it is natural for the multi-target density to 
incorporate both a Poisson point process (PPP) component (representing 
targets that have never been detected) and a multi-Bernoulli component 
(representing targets under track). We describe a method of recycling, 
in which tracks with a low probability existence are transferred from the 
multi-Bernoulli component to the PPP component, effectively yielding a 
hybrid of M-C and PHD. 



1 Introduction 

Recently much work has been devoted to RFS-based approximations such as 
the [C]PHD and MeMBer filters [I]. A key characteristic of these methods has 
been their tractability, which is the consequence of clever approximations that 
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avoid data association. Algorithms such as M-C [2] have received little attention 
due to the computational cost associated with calculating marginal association 
probabilities, i.e., similar to those used in joint probabilistic data association 
(JPDA) |3]. This paper develops a tractable version of the M-C algorithm 
based on a recently-developed, high quality, tractable approximation to these 
marginal probabilities |4|5| . Repeating the derivation of M-C, we obtain a form 
that permits application of this approximation, and show that it is natural 
for the multiple target distribution to contain both a component representing 
targets that have been detected, and a PPP distribution representing targets 
that so- far remain undetected, as previously suggested in [6]. 

Section [2] provides this derivation of prediction and update steps for the 
full Baycs multi-target distribution. Naturally this is intractable; difficulties in 
implementation occur in four areas: 

1. Dealing with the exponential growth of hypotheses within each track 

2. Appropriately representing the PPP distribution of undetected targets 

3. Dealing with the growth in the number of tracks as new measurements 
are received 

4. Performing inference on the joint target distribution (i.e., data associa- 
tion) 

The first difficulty has been well-examined in previous works such as [7] and [8] . 
In this paper, we argue that the second can be addressed via a grid filter using a 
relatively coarse discretisation (since the primary motivation is to represent the 
intensity of targets that have never been detected). In Section [6j we show that 
this solution can then aid in resolution of the third difficulty through recycling, 
which represents tentative tracks (those with low probability of existence) via 
the PPP distribution. Finally, we show in Section [3] that our previous results on 
LBP data association in 4[5 can be leveraged to address the fourth difficulty. 



We refer to the resulting algorithm as the Belief Propagation Marginal Track 
Filter (BPMTF). 

The BPMTF is effectively a hybrid of a PHD representing low confidence 
targets, and a MeMBer distribution representing high confidence targets. Our 
derivation also admits an alternative algorithm that maintains several scans of 
history, and thus the prior for the existing tracks at each scan is not multi- 
Bernoulli. Again we rely on BP to calculate marginal association weights; in 
this case, the weights calculated are for association history events, rather than 
just association events in the latest scan, similarly to multiple scan JPDA [9]. 
We refer to this as the multiple scan BP filter (MSBPF). In contrast to multi- 
ple dimensional assignment (MDA) [lO] , which calculates a multiple scan MAP 
association, the MSBPF approximately calculates multi-scan marginal proba- 
bilities. While maintaining inter-target dependence in the association history 
is essential in hard association systems (e.g., MDA), typically less benefit is 
observed in soft association systems such as the one studied here (i.e., systems 
which calculate marginal weights). Therefore, we expect that the performance 
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gain of the MSBPF over the BPMTF may not be large, such that the lower 
complexity BPMTF may ultimately be the preferred method. Some initial re- 
sults of this comparison were provided in |4j ; details of the MSBPF and further 
experiments will be the subject of a later publication. 

Finally, in Section [4] we describe a method for addressing the problem of 
coalescence, which affects most algorithms that calculate marginal association 
weights, including JPDA and M-C. The proposed algorithm is based on efficient 
network flow optimisation methods 111], and thus is tractable for problems in- 
volving large numbers of targets. 

In our development we neglect implementation issues such as gating, clus- 
tering and mixture reduction. Any practical implementation would obviously 
include these concepts; we leave these details to the reader. 

2 Random set filter derivation 

The derivation that follows largely parallels [2 , making extensive use of models, 
methods and results in 11. The assumptions in our development include: 

• Targets arrive at time t according to a non-homogeneous PPP with inten- 
sity A b (a;) (let A b = J X h (x)dx), independent of existing targets 

• Targets depart according to independent, identically distributed (iid) 
Markovian processes; the survival probability in state x at time t is P s {x) 

• Motion for each target is governed by a Markovian process, independent of 
all other targets; the single-target transition probability density function 
(PDF) is f t +i\t(x\x') 

• Each target may give rise to at most one measurement; the probability of 
detection in state x at time t is P d (x) 

• Each measurement is the result of at most one target 

• False alarms arrive at time t according to a non-homogeneous PPP with 
intensity A fa (z), independent of targets and target-related measurements 

• Each target-derived measurement is independent of all other targets and 
measurements conditioned its corresponding target; the single target mea- 
surement likelihood is f(z\x) 

We denote by Z t = {z 1 , . . . , z mt } the measurement set at time t, and by 
Z = {Zi, . . . , Z t } the measurement history up to and including time t. The 
development does not change if the birth density, survival probability, detection 
probability, false alarm density and measurement likelihood are time varying 
{e.g., depending on a known, time- varying sensor state); we omit the time in- 
dex from these parameters for notational simplicity. 

The form which we assume for the multiple target probability distribution 
at time t incorporates the following elements: 
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A PPP intensity of undetected targets, \%(x) 

A set of resource^] TZt, the elements of which are of the form (t,j), where 
j G { 1 , . . . , m t } is an index of a measurement in scan t 

A series of Bernoulli tracks, i G 71 = {1, • • • , n t }, each incorporating: 

— A set of hypotheses W t , the elements of which denote resource con- 
sumption^ (for each a G %\ we will have a QlZ t ) 

— For each a G %\, a hypothesis weight 

— For each a G 1-L\, a hypothesis-conditioned Bernoulli multi-target 
distributiorQ 

4 a pQ = tftfftw, * = { X } 

[O, \X\>2 

where is the probability of existence under the hypothesis, and 
fl^{x) is the existence-conditioned PDF 



Following i[5 , we also define resource compatibility tables ipp^a, b) G {0,1} 



for all i G 7t, j G 7vL t , a G W t , b G 7t as: 

^ J ' b = i, j gaovb^i, j €a 
i>r{a,b) = \ . (1) 

I 1, otherwise 

These functions are used within the graphical models formalism to encode the 
constraints that measurements can be used at most once in any association event 
(see Sections 2.2 and [3j and |4|5| ). Loosely, they enforce global consistency 



by maintaining two redundant representations of the hypothesis (variables a 
comprised of elements indicating the association hypothesis for each target, and 
variables b comprised of elements indicating the association hypothesis for each 
measurement), and ensuring consistency between them. 

The form of the distribution of targets that we study is of the union of 
two independent components: f^ t (X) representing targets that so-far remain 
undetectecj^jand f^ t (X) representing targets that have been previously detected 
and are thus under track. From these components we can reconstitute the full 
distribution as: 

V(A)= £ f^,{Y)f tW {X-Y) (2) 

YCX 



1 We use the term resources in the traditional operations research sense, i.e., abstract 
quantities that are shared between tasks in an optimisation. In our case, resources corre- 
spond to measurements and tasks correspond to tracks, thus encoding the notion that each 
measurement may be used at most once in any association event. 

2 We leave off conditioning on the measurement set history Z' = (Zi, . . . , Zt) throughout, 
as this is implicit in the second subscript / t | t . 

3 i.e., they have never previously been detected; this meaning will subsequently be altered 
in Section |6] 
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where t' = t — 1 for the one-step prediction, and t' = t for the updated distribu- 
tion. We digress momentarily to discuss the concept of maintaining a distribu- 
tion of targets that have never been detected. While this may be troubling to 
some tracking practitioners, it has been suggested previously in [6j, and studied 
extensively in search theory [12 ^] The reason for its inclusion is that when new 



targets are born, they are not necessarily detected in the first instance. There- 
fore, there is some probability that they will move (and potentially die) prior to 
being detected, which may occur several time steps later. If the initial intensity 
function of undetected targets, the birth rate of new targets, the probability 
of target death and the probability of detection are all spatially homogeneous 
within the region of interest, the intensity function of undetected targets will 
remain homogeneous through time and will converge to a constant value. How- 
ever, realistically these quantities are non-homogeneous and time-varying for 
reasons including: 

• Spatially non-homogencous arrival rates of new targets, e.g., at airports 
and at the boundary of the surveillance region (including due to sensor 
platform motion) 

• Non-homogeneous target death probability (for similar reasons) 

• Inherently non-homogenous, time-varying sensor detection performance 
(e.g., performance variation with range, moving sensors, and agile sensors 
such as electronically scanned array radar) 

Even in the case in which all these quantities are time-stationary, maintenance 
of a time- varying distribution of undetected targets can speed track initiation at 
system start-up, i.e., as the system transitions from the steady state condition 
in which sensors are not functioning (where there are many undetected targets 
and no targets under track), to the steady state condition in which sensors are 
functioning (with few undetected targets and most targets under track). As- 
suming a non-unity probability of detection, many tracks will take several time 
steps to be detected during this transient; the undetected target distribution 
follows the expected distribution of newly detected targets over time to pro- 
vide a posterior probability that each measurement is a newly detected target, 
incorporating knowledge of the history of sensor operation. 

Returning to our derivation, the PPP representing undetected targets is: 



f? [t (X) cx J] \? [t (x) (3) 
The component representing previously detected targets is: 

f* t ({ X1 ,..., Xn }) *y,t[ (utftif&a®) n ] w 

a,a,bieTt \ j&TZt j 



4 Our work in this area commenced with |l3| , which uses a PPP distribution of undetected 
targets to tractably solve non-myopic dwell time optimisation problems for new target search 
with an agile beam radar. 
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where 

Y a J {x a (i)}, a(i) > 
Q(l) ~\0, a(i)=0 

The sum in Q is over all functions a : 7t — > {0, ...,«} such that {1, . . . , n} C 
Qt(7~t) and if > 0,i ^ j then 7^ a(j'Pm.e., there is exactly one track 
mapped to each Xi 7 i 6 {l,...,n}), and over the Cartesian products 

ae l[nt, be [] 71 

i£T t jen t 

The resource tables ^r J ('> ') m @ represent data association, i.e., the constraint 
that two tracks cannot utilise the same measurement under any given event. 
The form may appear to be mysterious now; it will be explained further in 
Sections EH1 and ill 

Finally, note that f t \t can be written in probability generating functional 
(PGF1) [T| p372] form as 

G t]t [h]=G^[h]-G p t]t [h] 

where [l] p373] 

G% t [h] ex exp{A^]}, \% t [h] = J X% t (x)h(x)dx 

and G^ t [h] is the PGF1 of f^ t (X). Since, conditioned on a joint hypothesis 
a = (ai, . . . , a„ t ), this distribution is multi-target multi-Bernoulli, we can write 
this via linearity of the PGF1 (see Appendix [A| as: (T[ p374] 

°t\M = e n ("tfetfw n ( 5 ) 

a,b i£% \ jelZt J 

where = 1 — q l t u + <Z t 'j" / h(x)f^(x)dx is the PGF1 of f^(X). 

2.1 Prediction step 

The dynamics model described in the assumptions at the beginning of Section [2] 
corresponds to the case IV model in 111 p474]: 

G t+1]t [h\X'] cx (1 - P s + P s Ph ) x ' ■ exp{A>]} 

where, in accordance with notation from [l], 

(1 - P s + P s Ph ) x ' = 11 [l-P s (x') + P s (x , ) Ph (x')} 
x'ex' 

Ph(x') = / f t+1 \ t (x\a/)h(x)6x 



Note that if n > nt = \7t \ then there is no such function, so the sum will be zero. 
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= exp{A>]} ■ GUI ~P S + P s Ph ] ■ GUI -P s + P s Ph ] (6) 



Then by [T] p529], 

G t+nt [h] ex exp{A>]} ■ G t]t [l -P s + P s Ph ] 

% {t [l-I» + P>p h ]-„ Mt 

The leading terms correspond directly to the prediction step in the PHD [14] : 

G» +1 \ t [h] cx exp{A>]} • GIj t [l - P s + P s Ph ] 
cx exp{A>] + A^[l - P s + P s Ph }} 
^ exp{X b [h] + X^ t [P s Pfl }} 
= exp{X? +1[t [h]} 

where X% + Uh] = X h [h] + XUP s p h ], so that 

X u t+l]t {x) = X h (x) + J f t+llt (x\x')P s (x')Xr it (x')dx' (7) 

Consider now the term in ^ representing previously detected targets, 
G p t+Mt [h] = GUI - P s + P s p h }. Through the expression for the PGF1 
of GjiJ/i] in (|5j) (which observes that, conditioned on a joint association 
hypothesis, the multi-target density is multi-Bernoulli), this can be calculated 
using the prediction step of the MeMBerj^] [lj p675] 

G p t+llt [h}±G p tlt [l-PUP s Ph ] 

= £11 [v>$G>£[l-I» + P'PH] II ^ j (a\V) 

a,b ieTt \ J&Tlt 

=En n ) (8) 

a,b ieTt \ j£K t J 

where 

G- 1]t [h]-G-[l-PUP s Ph ] 

= l- g ^ + q ^J\l-P s (x')]f; i :(x')dx'+ 

+ ql\t J J h[x)f t+1 \ t {x\x')dxP a {x l )f t ^(x l )dx' 

= 1 - Qt+i\t + i+i\t J h ( x )ft+i\ t ( x ) dx 

6 With target death but without target birth, as this is handled through the separate 
Poisson component G" +1 |J/i]. 
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where the final equality is made by defining: 



f t ' +1 \ t ( x ) 



fp°(x>)f t ::(x>)dx> 



To complete the definitions, we set ^+11* = w t\t> an< ^ /t+iltC^") ^° ^ e ^ e 
multi-target distribution corresponding to the PGFl G^jiJ/i]: 

ft+i\t( x ) = ^ ffJ+iit/t+iitC*)' x = M 
(o, \X\>2 

Consequently, the prediction step on the distribution of previously detected 
targets is implemented simply by processing each track separately with the 
above expressions. 

2.2 Measurement update step 

The measurement model described in the assumptions at the beginning of Sec- 
tion [2] corresponds to the case V model in [TJ p422]: 

G t+1 [g\X] cx exp{A fa [ 3 ]}(l - P d + P d Pg ) x 

where 

(1 - P d + P d Pg ) x = J] I 1 - pd ( x ) + pd ( x )Pg( x )] 

Pg( x ) - J f( z \ x )g( x )dx 

Subsequently, the joint PGFl of measurements and targets can be written as: [l] 
p531] 

F[g, h] cx exp{A fa [.g]} • G t+1{t [h(l - P d + P d P a )] 

= exp{A fa [g]} • G« +1 |JM1 ~ ^ d + P d P g )\ • G p t+1]t [h{\ - P d + P d p g )] 
(x exp{A fa [g] + A? +1 , t [ft(l - P d + P%)}} ■ G^J^l - P d + P%)} 
= expW+nMl ~ P d )]} ■ exp{A fa [. 9 ] + X? +llt [hP d Pg ]}- 

■G p t+llt [h(l~P d + P d Pg )} (9) 

By p530], 



8F 

G t+1]t+1 [h] cx — — [g,h] 



g=0 
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Since the first term in (|9| does not involve measurements (i.e., the measurement 
functional g does not appear), we can separate this into terms 

Gt+i\t+i[h] = G" +1 | f+1 [/i] • G p t+1 \ t+1 [h] 

where the distribution of undetected targets is equivalent to a PHD update in 
the absence of measurements: fl4l 



G? +1|4+ J/i]c<exp{A? +1|t [/i(l-P d )]} 
= exp{A" + i| t+1 [/i]} 



(10) 



We now turn to the distribution of detected targets (both those that have been 
detected previously, and those that are detected for the first time at (t + 1)): 



G p t+Mt+1 [h] ex — (exp{A ta [ 3 ] + X<t +llt [hP d Pg }} ■ G p t+Mt [h(l - P d + P d Pg )} 



3=0 



This can be evaluated via the chain rule: u\ p389] 



5Y 



(F 1 [h]---F n [h]) = Yl 



W 1 it)--itlW rl =Y 



8F X 
SWi 



[h] 



SF n 



[h] 



(11) 



where the notation U denotes that the sum is over all disjoint sets W\, . . . , W n 
such that W\ U • • • U W n =Y. To begin, we consider the components required 
to evaluate (111. It can be easily shown that 

5 



5Z 



ex P{%]} 



= n a w 

9=0 zez 



Consequently, 
6 



SZ 



exp{A ta [ ff ]+A;V 1|t [^P%]} 



3=0 



= II (^(z) + K +llt [hP d f(z\-y_ 



zez 



n w t+i\t+i 



zez 



(1 - 9t+i|t+i) + «t+i|t+i / ft+i\t+i( x ) h ( x ) dx 



} I w i+i\t+iG Z t + i\t+i\h] 



(12) 



zez 
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where 



\? +Mt {hP d f(z\-)} 4 j h(x)f(z\x)P d (x)X^ +1{t (x)dx 

< 1 | t+1 =A fa (z) + A t " +1|t [/(z|-)P d ] 

A t " +1| J/(z|-)P d ] 
%+i| t+ i A ^(z) + A- +1| J/(z|.)P d ] 
/(z|x)P d (a:)A- +1|t (x) 



(13) 



- A « +1|t [/(,|.)Pd] 

^t+llmM = (1 ~ 9t+l|i+l) + Q't+l|t+l/t Z +l|t+l[^] 

, Gj +1 | t [/i] can be decomposed into a sum of multi-Bernoulli 
lence G p t+1 ^ t+1 [h] will be composed of many Bernoulli updates of 



As shown in 
distributions, 
the form 



w t+i\t+i Lr t+i\t+n n \ ~ —w 



SZ 



G^ llt {h(l-P d + P d P 9 )} 



9=0 



1 - £2m + ql'l x uf t ? m [h{l - P A ) 



^ t \^il lt \hP d f(z\-) 
0, 



so that, in the case where Z 

G i,a ' 



we find 



t+\\t+i 

i,aM 



[■7 1 2,a,w . z,a,w /*i,a,w rn 

~ 1 — "t+l[t+l + "t+l|*+l-'t+l|t+lL J 

ai,a | i, a j*i,a r-, r.d"l\ 



A 2, a 
W t+l\t+l ~ W t+l\t 



t+i\t ~ ^t+i^J t+i\t y 

t+l\tft+l\ti 

i,a ri.a 



q^uJtUll ~ P d \ 



t+l|t+l ^"^^ — 



and when Z — {z}, we find 



t+l|t+l i i,o r „*,<» f^ a ri pdl 

1 «i+i|t "T" %+\\th+\\t\- y r \ 
. \l-P A (x)]f^ t {x) 



flUA^-p"] 



r ,i,a,{z} r. l _ I _ i.a,{z} i,a,{z} ri,a,{z} r, i 



^t+l|*+l 
i,a,{z} A -, 



^< 1|t Ci|^|J^|-)^ d ] 

A i 

f+l|t+l 



•u+ilt+iW ~~ 



/(z|x)P d (x)/; ; a 1|t (x) 



/;; a 1|t [/(^l-)^ d ] 



Z = 
Z = {z} 
|Z|>2 
(14) 



(15) 



(16) 
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In order to apply the chain rule (fTTh, we seek a convenient form for a summa- 
tion over all subsets of Z t +\. Lemma 111 shows that this can be achieved using a 
form similar to that in Q . The motivation for using this form will be discussed 
in Section [3l 

Lemma 1. Suppose we are given set functions Fi(Z), i £ {0, ...,n}, such that 
F (Z) = Hzezfo(z) and if i > 1 and \Z\ > 2 then Fi(Z) = 0. If Z = 
{z\...,z m }, then 

J2 F (W ) ■ ■ ■ F n (W n ) 
w w---t±>w n =z 

(\ n / m 

J I faJ =0 / i=l \ j=l 

where the sum over (a, b) is over all a = (a 1 ,..., a") such that 
a 1 e {0, . . . , m} V i and all b = (&,..., b rn ) such that V 6 {0, . . . , n} V j, and 



Z a == 



0, a 1 = 
z a * , a 1 > 

0, V — i,a l 7^ j or a 2 = j, o- 7 7^ i 

1, otherwise 



Proof. We show that there is a one-to-one equivalence between non-zero terms 
in the LHS and RHS expressions. First, consider a non-zero LHS term, i.e., a 
choice of (Wo, • ■ • , W n ) (with Wi mutually disjoint and Wo U • • • U W n = Z) for 
which F (W ) • • • F n (W n ) 7^ 0. For each i 6 {1,...,»} we either have W — 
{z a } for some a 1 6 {1, . . . , m}, or Wj = or in which case let a 1 — 0. Then 
Wi = Z a% V i e {1, . . . ,n}. For each j e {1, . . . ,m} let = {i|a* = j}. If 
£ 3 = then set fr? = 0. If ^ = {i} then set fo? = i. Note that |#?| < 1 as 
Wi = Z a ' and the sets W t are disjoint. Then since W = Z\{W X U • • • U W n ), 
we must have Wo = {z'\V = 0}. By construction, tpp^a 1 ,^) = 1 V 
hence there is a RHS term that is equal to the LHS term under consideration. 
It is clear from the construction that any change to (Wo, . . . , W n ) will produce 
a different (a, 6), so the mapping is injective. 

Now consider a non-zero RHS term, i.e., a choice of a = - (a , . . . ,a n ) and 
6 = (6 1 , . . . ,o m ) such that ffi (<!*,&) = 1 V Let W = {z s \W = 0}, and 

Wi = Z a for i G {1, . . . , n}. Seeking a contradiction, we suppose that the sets 
(W , . . . , W n ) are not disjoint. If W H W 7^ then we have fyi(a\V>) = for 
j = a 1 (since V = 0) which is a contradiction. Similarly, if W,' n Wi 7^ 0, i' 7^ i, 
then for j — a 1 , we must have either tpp^ (a 1 , b>) =0 or J (a l , 6 J ) = 0, which 
is a contradiction. Thus (Wo, . . . , W n ) are disjoint, and the LHS term involving 
(Wo, . . . , W n ) is equal to the RHS term under consideration. 

Finally, suppose that (a, b) and (a, b) both map to (Wo, . . . , W n ). Then we 
must have a = a and = 0} = = 0}. Suppose b 3 ' — i 7^ V . Then since 
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a 1 = a 1 , either ipp 3 (a 1 ,V) = or x/jp 3 (a 1 — 0, hence one of these two terms 
is zero. □ 



The following corollary is a direct consequence of the chain rule of (11), 
followed by an application of Lemma [T] 

Corollary 1. If the a priori distribution of pre-existing targets is 
multi-Bernoulli, i.e., 

G t + i\ t w = n w Ui\t G Ux\M 

ieTt 

where G l t+1 ^ t [h] (i G T t ) is a Bernoulli distribution (initially assume w l t+1 ^ t — 
1, but we retain the weights to permit different values), then the distribution 
updated with Z t +\ = {z , . . . , z mt+1 } is 



a,b \ j\bi=0 



m t+1 



•II hm|t +1 G m| t+1 W U^^M) 

i=l \ J'=l 

Since the predicted distribution in (J8| is a mixture (i.e., linear combination) 
of multi-Bernoulli distributions, the desired update follows directly, as given in 
the following corollary. 

Corollary 2. Assume G^ +1 ^ t [h] is as given in Q). Then 

a,b ieTt+i \ jeiit+i 

where T t +i = {1, . . . ,n t+1 }, n t+1 = n t + m t+1 , K t+1 = lit U {(t + 1, G 
{1, . . . , rnt+i}}, the sum over a and b is over all 

he n H\ +1 , be [] T t+ i 

For existing tracks i G {1, . . .,n t }, H\ +1 = HI U {(a 1 U {(t + 1, j)})|a* G Hj, j € 
{1, . . . ,m t+ i}}; for each a 1 = a 1 G W t , Wt+ x \ t+l = ^+i] f+ i an d G t+i\ t +i[h] = 



G t+i\t+x^ f rom and f° r each a 1 = {a 1 U {(t + G U\ +li 

and G\l nt+l [h] = Gj^JJ* [ft] /rem 



i6) 



For Kew traces z = n t + j, j G {1, . . . ,m t+ i} ; = {(0,{(i + 1, j)})} 

- 1 <7 iJ Til - 1 v M( t+1 >M - w * 3 and <7 i,{(t+1 j)} \h\ - 

w t+l\t+i ~ t+l|t+l L J — ' t+l|i+l — "-'t+l|i+l u,t " °t+l|t+l L'l ~ 



Gf +1 | t+1 [ft]i«m (13). 
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2.3 Initialisation 



Initialisation can take advantage of whatever prior information is available. The 
natural choice is to set X£, (x) to be the steady state intensity of undetected 
targets in the absence of sensor measurements, i.e., the X u (x) which satisfies 

A u (a;) = \ b (x) + J f t+llt (x\x')P s (x')\ u (x')dx' 

This is the steady state distribution of targets induced by target birth (with 
intensity X h (x)), movement (with the transition PDF ft+i\t( x \ x ')) an d death 
(via the survival probability P B (x)). If these quantities are modelled sufficiently 
accurately, this distribution should provide a faithful representation of the ex- 
pected number of targets present in the scene. Since there is no detection history 
and hence no tracks, we set 7o = and TZq = 0. 



2.4 Representation of undetected target distribution 

So far we have said little about the distribution of undetected targets, as defined 
by and ([To). There is little prior work in which this has not been assumed 



to be homogeneous. We propose discretising the state space in order to rep- 



resent this distribution via a grid filter. Grid-based representations (e.g., 15 ) 
have fallen out of vogue for most estimation applications as they are generally 
inefficient. In particular, in any problem where the probability distributions are 
peaked, grid representations result in large computing resources being spent on 
the vast majority of cells with near-zero probability density. In comparison, the 
preferred sample-based methods [16] allow computing resources to be focussed 
on the region with significant probability mass. However, in the present appli- 
cation, the Poisson intensity function of undetected targets is, by nature, both 
diffuse and smooth. Thus a grid-based representation is efficient (since most grid 
points will have comparable intensity), and, furthermore, a coarse discretisation 
suffices, such that the grid-based method is tractable. Conversely, sample-based 
methods using any reasonable number of samples would be likely to exhibit poor 
sample coverage in the vicinity of some cells. 



2.5 Comments and relationship to existing work 

At this point, it may seem that little has been achieved other than an alternative 
derivation of the full Bayes RFS filter. The derivation closely parallels [2], with 
the addition of a PPP component of undetected targets, which was suggested 
previously in Rj]. The major purpose of the derivation was to incorporate the 
somewhat obscure form of Q. In Section [3] we show how this form permits 
application of the method described in [4|5] , which is the major contribution of 
this paper. Graphical models, upon which our approximation is built, were first 



applied to tracking in 17J- 19], which focused on distributed sensor networks. 

Because of the hybrid (PPP, MeMBer) representation, there are interesting 
relationships between the proposed algorithm and both the PHD and MeMBer 



13 



filters. The prediction equation for the PPP component ^ is exactly the PHD 
prediction equation, and the update equation for the PPP component ( flO| ) is the 
PHD update for the case in which there are no measurements. In Section [6] we 
propose a concept of recycling, which reduces the number of tracks that need to 
be maintained by representing low probability of existence tracks via the PPP 



component. In 14 , Mahler interprets the PPP as being a projection of the 
updated multi-target distribution onto the subspace of PPP distributions. Not 
surprisingly then, if we project all tracks onto the PPP representation, our filter 
reduces to the PHD. The concept of projecting a subset of tracks onto the PPP 
representation permits us to focus computational resources on tracks with a 
non-negligible probability of existence while still being able to gradually accrue 
confidence in low SNR tracks (e.g., in low probability of detection conditions). 
We expect that this structure will be most useful in track before detect (TkBD) 
problems; extending our derivation to this case is a topic of future work. 

The relationship between the previously detected track portion of the pro- 
posed algorithm and the MeMBer filter is also interesting; this relationship is 
very similar to that between M-C and MeMBer, and is explored in Fig. [I] The 
figure shows the hypothesis tree structure for the tracks maintained by the pro- 
posed method and the MeMBer. In the proposed method, at each time each 
existing leaf hypothesis is updated with each new measurement to form new 
leaves. Additionally, new tracks are formed representing the hypothesis that 
the measurement is a newly detected target. Consistency constraints are en- 
forced between trees in the calculation of approximate marginal probabilities 
via the method described in Section [3j In the MeMBer, each existing track is 
retained, updating each leaf assuming a missed detection. An additional track 
is formed for each new measurement, incorporating hypotheses that the mea- 
surement is a new target, or an extension of each previous track. Each track is 
treated as being independent, i.e., no consistency constraints are enforced. 

The probabilistic structure implied by the hypothesis trees is such that one 
may choose one leaf from each treeQ in any association event. The MeMBer 
places priority on the constraints that each measurement in the latest scan 
can correspond to at most one target, and re-forms the hypothesis trees such 
that these constraints are implicit in the tree structure (i.e., all hypotheses 
updated with the same measurement in the latest scan are in the same track). 
Consequently, continuity of association history is lost; e.g., association events 
in which the same historical track is updated with two different measurements 
are legal, since these reside in different tracks. Conversely, in the proposed 
method (and M-C), continuity of tracks is maintained, but there are leaves of 
different trees corresponding to the different tracks being updated with the same 
measurement. In problems involving closely spaced targets, these constraints 
cannot be ignored, which brings about the complexity of the proposed method, 
M-C and MHT, as well as the problem of coalescence. Our solution to the 
problem of computational complexity is detailed in Section [3j and our solution 
to the problem of coalescence is described in Section [4] 

7 i.e., one association history hypothesis for each track 
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(a) Proposed approximation, Morelande- Challa 
Track 1 Track 2 



Tracks 3, 4 




Tracks 5, 6 
• O 



(b) MeMBer 



Tracks 1, 2 
Scan 1 • O 

Tracks 3, 4 
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Scan 2 



o o 



Legend 

Missed detection/new target O 
Poisson approximation O 
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Q Q 999 999 999 999 99 999 999 99 

2nd scan 
2 measurements 



Scan 3 9 9 




3rd scan 

2 measurements 



O O 



• O 



• O 



9 9 9 9 



Figure 1: Hypothesis trees for (a) proposed method (same as that of M-C), and 
(b) MeMBer. 



One significant advantage of the structure proposed is that, in the case of 
well-separated targets, it reduces to a series of parallel JoTT filters, which is 
optimal in these conditions. In contrast, the MeMBer filter still splits the tracks 
into hypotheses that are not updated, and tracks that are updated with each 
measurement. As described in the derivation of the MeMBer [l] p668, pp678- 
681] , the impact of this approximation will be low if the probability of detection 
is high and the false alarm probability is low (since the legacy track will have 
low weight), but it will be significant in medium to low probability of detection 
conditions. 

Two variations of the JoTT filter to address cases with association conflicts 
were described in [20] , The first uses a linear multi-target approximation (e.g., 
|21| ), which, for each track, approximates all other tracks as being represented 
as a Poisson intensity. The other is effectively a global nearest neighbours filter, 
modified to incorporate target existence. Our method is closely related to these, 
and performance will be compared in future experiments. 

Finally, we note that it is possible to use our approximate method to enforce 
constraints from previous measurement scans, rather than the most recent scan. 
This leads to an alternative approximation that uses the hypothesis structure 
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from the MeMBer, but uses the graphical model-based method to approximate 
the weights within each track. This method is described in Section [5j We also 
note that the original MeMBer filter in [l] has been shown to exhibit cardinality 
bias, which was subsequently addressed in |22| . 

3 Belief propagation marginal track filter 
(BPMTF) 

Consider the form of and observe that it can be written equivalently as: 

a a i£T 

An obvious simplification is to approximate the joint association event distri- 
bution pt\t(a) by the product of its marginal distributions YiieTt Pt\t( aI ) ■ This 
leads directly leads directly to an approximation of the multi-target distribution: 

G P tl M = Ii G MtW (18) 

ieTt 



or equivalently 



where 



4w=En^(^«) (1 9 ) 



Gt lt [h]= E pW)Glf[h] (20) 

P At {X a(l] )= PW)fiif(X<*(i)) (21) 

This is precisely the multi-Bernoulli form of [lj p 656] , and the use of marginal as- 
sociation probabilities is equivalent to that of [2] . The difficulty in implementing 
this approximation is in the calculation of the marginal association probabilities 
p l ^ t (a l ); in general this calculation requires summation over all joint association 
events. 

This section discusses the framework of graphical models; in Section |3.4| 
we show how we can employ emerging approximate methods from within this 
framework to approximate these marginal probabilities. A significant amount of 
background material is included to provide a level of confidence that the method 
is based on a principled, convex optimisation-based approximation to the exact 
quantities. 

We refer to the filter of the above form, replacing the exact marginal dis- 
tributions with their approximations calculated using belief propagation, as the 
belief propagation marginal track filter (BPMTF). 
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Figure 2: Section of a graphical model depiction of an HMM. The hidden state 
at time t is Xf, and the observation at time t is z t . One choice of factors would be 
ipx (xo) = p(x ), ip( Xt _ ltXt )(x t -i,x t ) =p(x t \x t - 1 ) and ip( XuZt )(x t ,z t ) =p(zt\x t ). 



3.1 Graphical models and belief propagation 



Belief propagation (BP) [23 -25 is a generalisation of common inference algo- 



rithms on Markov chains such as forwards-backwards 25 ■ 27 in discrete hid- 
den Markov models (HMMs) and the Kalman filteiQ |29|30| in Gauss-Markov 
chains. Systems of random variables are represented as an undirected graph 
(<?,£), where the vertices of the graph v £V represent random variables, and 
the edges e G £ C V x V represent probabilistic dependencies between random 
variables]^] For example, the graphical depiction of an HMM is shown in Fig. [2j 
For clarity, we restrict ourselves to pairwise graphical models, although this 
is not required^ The joint probability distribution of the variables x — (x v ) ve \> 
can be written as: 

"1,1)2) V "1 ' "2 I 

(22) 

vev (v!,v 2 )e£ 

where ipv(') and ip(v 1 .v 2 )('j ') are the factors that collectively make up the distri- 
bution. A graph encodes the conditionally independence structure in a network 
of random variables; e.g., if all paths from vertices A C V to B C V pass through 
the vertex set C C V, then A and B are conditionally independent given C. In 
the case of Fig. [2] this simply corresponds to the Markovianity property, and 
the property that the observation at a given time is conditionally independent 
of all other states and observations given the state at the same time. 

If a graph is acyclic J^] then inference may be performed exactly by iterating 

8 More precisely, the fixed interval Kalman smoother 281. 

9 Since we are de aling with undirected graphs, if (vi, V2 ) £ £, then (v2,vi) £ £. 
10 As described in [3l| p289], any graphical model can be converted to a pairwise model via 
a simple transformation; factor graphs [32| provide a systematic mechanism which uses that 
transformation. Alternatively, the distribution may be written as the product of factors on 
maximal cliques. 

i.e., there is at most one path between any two vertices. 
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the following equation for all (vi,V2) S £ 23 25|31 



fj, v 



f^ii2|('«,fi)e£ 



(23) 

The quantities /Lt„ 1 _> W2 (-) are referred to as messages, since they are exchanged 
between neighbouring vertices. Upon convergence (i.e., when subsequent iter- 
ates of all messages are identical) , the marginal distribution of any vertex oeV 
may be recovered as: 



P(X V ) CX 



t;'|('U ) i u / )£f 



(24) 



and the pairwise distribution of any edge (vi : V2) € £ may be recovered as: 



P\3*Vl 1 %V2 ) ^ V^l )'0i>2 (*^^2 ~)^P{vi ,V2) i&V\ •> %V2 ) ' 



n 



n 



/•V 



(25) 



For obvious reasons, this is also referred to as the sum-product algorithm; as 
previously stated, it may be seen to be a generalisation of the fixed interval 



Kalman smoother 28 , and of the forwards-backwards algorithm for inference in 
HMMs 25 - 27 . The related max-product algorithm replaces the integral in ( 23 1 



with a maximisation operation, providing a generalisation of the Viterbi algo- 
rithm 1 25 27|33 to trees just as sum-product generalises the forwards-backwards 
algorithm. Algebraically, this simply replaces the (+, •) semi-ring with the 
(max, •) or, working in negative log-space, the (min, +) semi-ring. 

Inference on cyclic graphs can be conducted optimally using a similar al- 
gorithm on a modified graph referred to as a junction tree 24|25 . Loosely 
speaking, the concept is to merge vertices together into hyper-vertices, until 
one obtains a graph which is a tree. In the discrete case, the complexity of this 
algorithm increases exponentially with the maximum hyper-vertex size. 



3.2 Loopy belief propagation 

While the algorithm above was derived for and is optimal for acyclic graphs, it 
can also be applied directly to cyclic graphs; this is referred to as loopy belief 
propagation (LBP). In the case of cyclic graphs, LBP is neither guaranteed to 
converge to the optimal result, nor to converge at all. Nevertheless, and perhaps 
surprisingly, it has exhibited excellent empirical performance in may practical 
problems 34 . For example, the popular iterative turbo decoding algorithm has 



been shown to be an instance of LBP 35 



In discrete (or hybrid) cases, an appropriate counting (hybrid) measure is used. 
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The common wisdom is that the performance of LBP will be acceptable when 
there are few cycles, and when the coupling strength (i.e., the degree of cor- 
relation/dependency between variables) in the cycles is weak [34]. Conversely, 
when there are many cycles involving strong coupling, performance is expected 
to be problematic. In recent years, however, there have emerged some dense, 
loopy structures for which performance is surprisingly good. The remarkable 
result that max-product LBP applied to a complete bipartite graph optimally 
solves the two dimensional assignment problem in time comparable to that of 
the auction algorithm was proven in |36|. Since then, authors from machine 



learning, statistical physics, information theory and tracking 4|5|37 -40 have 
studied the properties of sum-product LBP as an approximation to the related 
^P-complete problem of calculating the matrix permanent. A by-product of 
this is algorithm is an approximation of the marginal association probabilities 
that are required for JPDA and ( 21 ). The accuracy of the approximate marginal 
association probabilities was studied in |3] , and convergence was proven in [5] . A 
more general convergence proof was obtained via a different route in the parallel 
40|4T 



The latter work also showed that the Bethe free energy is a convex 



work 

lower bound to the Gibbs free energy for the problem of interest. Section [373 



develops the theory required to understand this statement; Section [3. 4| presents 
the model to which these results apply, and the results themselves are discussed 
in Section [3.4.31 



3.3 Gibbs free energy, Bethe free energy 

The problem of finding the marginal distributions of vertices (random variables) 
in a graphical model can be posed as a variational problem, i.e., as an optimi- 
sation. The common optimisation is one involving the Kullback-Leibler (KL) 
divergence 42|43 : 

minimise / q(x) log — -^dx (26) 
i J P(x) 

It is well-known that (e.g., 42 ) in the above problem, the unique minimum 
value of zero is attained by settingp^j q(x) = p(x). Manipulating the objective, 
we obtain: 

J(q)= I q(x)\og^\dx 



p(x) 

= - I q(x)logp(x)dx - H(q) 



h (q) _ E / qv(xv)\ogip v (x v )dx v 

Q(Vl ,V2) [P^Vx 1 ) tP(Vx ,^2) V^Vx 1 %V2 )^-^Vx ^-%V2 (^^) 



E 



3 Other than on a set of zero measure. 
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where q v (-) and qr Vl , V2 )(-, •) are the marginal distributions of the random vari- 
ables v and (vi,V2) respectively. This objective can be shown to be equivalent 



to the Gibbs free energy, developed in statistical physics 44 



The only appearance of the full distribution q(-) in (27) is in the entropy 
term H(q); otherwise only the vertex marginal distributions q v (-) and the pair- 
wise edge marginals qt vllV2 )(-, •) appear. Variational approximation methods 
approach large-scale inference problems for which exact calculations are in- 
tractable by approximating the elements of the underlying variational problem, 
i.e., the objective function and the feasible set 31 Section 4]. In the objec- 
tive function in p7| , this corresponds to replacing the entropy H{q) with an 
approximation that involves only low order marginal distributions {e.g., vertex 
and edge marginals). One can then perform the optimisation in (26) directly 



in terms of these low-order marginals, subject to constraints which ensure that 
the collection of marginal distributions could feasibly arise from some valid joint 
distribution. 

Many variational approximation methods have been proposed including 

31p5 



here 



mean field, belief propagation and tree reweighted sum product 
we focus on loopy belief propagation (LBP). A clos e relat ionship between 
LBP and the Bethe free energjj^] was established in 47|48 . Specifically, a 
problem was studied that was parameterised via vertex and edge marginals, 
the feasibility constraint was relaxed to a local consistency constraint, i.e., 



Q(vi ,v 2 ) {xv± 7 x V2 )dx Vl — q V2 (x V2 )V(v u v 2 )€E,Vx V2 (28) 
and the objective was replaced with the Bethe free energy p^] 
J(q) = -^2 J Qv(xv)\ogi) v (x v )dx v 



E 

(vi,v 2 )e£ 



I(Q(vuv 2 )) (29) 



where 



H(q v ) = - J q v {x v )logq v (x v )dx v 

f f q^V V ) (>Xy Xy ) 

-^( ( Z(«i,t>2)) // 9(v i,v 2 ) {x Vl j x V2 ) log — -. ;— -. rdx Vl dx V2 

J J qv% yx Vl )qv 2 {x V2 ) 

The Karush-Kuhn- Tucker (KKT) condition^] for this problem provide a series 
of simultaneous nonlinear equations in the various node and edge marginals. 

14 Named after work by Sir Hans Bethe, Nobel laureate in physics, |46j . 
lj We use the equivalent form from [31 p83]. 

16 The KKT conditions are necessary conditions for optimality in constrained optimisation 
problems (and sufficient conditions in convex problems satisfying minor technical conditions) 
[49l. 
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Manipulating these equations gives rise to the sum-product LBP equations ( 23 ) 



(24), (25); the messages fi Vl ^ V2 (-) correspond to Lagrange multipliers for the 



various local consistency constraints 31 . Applying the LBP equations iter 



atively will attempt to solve the nonlinear equality system using the general 
iterative method. 

Whereas the Gibbs free energy is a convex function of the distribution q(x), 
the Bethe free energy is generally a non-convex function of the various vertex 
and edge marginal distributions. This is one of the major causes of the conver- 
gence difficulties that LBP experiences in some problems. However, recently it 
has been shown that in the problem of interest, the Bethe free energy is con- 
vex with respect to a particular parameterisation 40|41 ; this result leads to a 



convergence guarantee, and some intuition as to why the marginal probability 
estimates appear to be of high qualityp^] 

3.4 Approximating marginal association probabilities 

Returning now to our problem of interest, we seek marginal distributions of a 1 
and b 3 where 

p(a,6)«n ("" II '•;;••.:>/'./;•,) (30) 



Through ( 21 ), these in turn will provide us with Bernoulli multi-target marginal 
distributions of the various tracks that summarise the distribution of previously 
detected targets. 

First, we restrict ourselves to the case in which all hypotheses a 1 consist 
of either zero or one measurements; this covers the single time step case. The 
graphical model for this case is shown in Fig. [3j The sum-product messages in 
this case are: [5] 

^a.K) = £^' J (a\^) LI /V-W^') 



3.4.1 Simplified message equations 



In 36 , the MAP version of this problem was studied, and simplified yet equiv- 
alent messages were provided. Simplifications for the sum-product case were 
derived similarly in 5|37 . The simplifications exploit two facts: 

• Each matrix i/jp J (a l , b 3 ) contains only two unique rows and two unique 
columns 



1 7 Empirical evidence suggests that LBP provides good quality estimates when it converges, 
e.g., [34]. 



21 



Figure 3: Graphical model depiction of for single scan data association problem. 
Left-hand variables a 1 hypothesise the measurement with which target i is asso- 
ciated; right-hand variables b 3 hypothesise the target with which measurement 
j is associated. 

• We are free to renormalise the messages /V-^C^) an d ^iy_). «(a*) 
Consequently each message can be parameterised by a scalar: 



1, otherwise 

/i 6 i^ i, a 1 = {j} 
1, otherwise 



In terms of these scalars, the simplified message updates become: 

w h{o} 

Mtf->-a< = (32) 

where we use the convention that w l,a = for a ^ W . If the false alarm density 
is non-zero and the probability of detection is less than unity then the denomi- 



nators are guaranteed to be non-zero P°| The non-linear equations (31 ) and (32 1 
can be iterated until convergence is obtained; Matlab code for a modified version 
of this process was provided in [5]. Upon convergence, approximate marginals 
can be recovered as: 

~ i (a i )(X S wifi > a ' i = f331 

p'^k^k, V=ieT (34) 



18 The only exception to this is the case in which there is a measurement that can only be 
explained as being a new target; inference in this case is trivial. 
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3.4.2 Comments on accuracy of approximation 

The accuracy of this approximation was studied in [4]. The experiments stud- 
ied problems involving six targets on a regularly spaced grid, where the grid 
spacing was varied between zero, and a distance such that the targets were non- 
interacting. The worst case average maximum error in the marginal association 
weights was 0.006 when P d = 0.3, and 0.083 when P d = 0.9. Holding P d = 0.9 
and increasing the false alarm rate also showed a small reduction in the aver- 
age maximum error. Thus, in the most challenging problems, the approximate 
weights are closest to the exact values. 

Should the accuracy of the approximations provided by this method be in- 
sufficient, a range of other methods (e.g., 48 , and the emerging result of [41p0] ) 



are available that provide improved accuracy at the expense of increased (but 
still polynomial) computational complexity. 

3.4.3 Convergence guarantees and complexity 



Convergence of LBP in this problem was proven in parallel in 40|41 and [5] 



The proof in [5] assumed a non-unity probability of detection and a non-zero 
probability of false alarm, and showed that this yields a contraction. The results 
40|41 provide some very interesting properties of the Bethe free energy for 



the case of interest, and leverages these to establish convergence in the more 
general case (admitting the case with no false alarms). One result is that the 
Bethe free energy is convex with respect to a particular parameterisation. In 
the variational approximation framework, this establishes that LBP is solving 
a convex approximation to the exact inference problem, in turn providing some 
intuition as to why the observed empirical performance has been surprisingly 
good. 



4 Coalescence avoidance 

Using the algorithm proposed in Section [3j targets that are close to each other 
will compete for measurements, leading to coalescence-like phenomena (e.g., 
[5l]). It is in these same circumstances that convergence of LBP will be at its 



slowest, and the quality of the approximation in Section 3.4 will be at its poor 



est. The problem was addressed in 51 by deleting all but the most likely joint 



hypothesis among groups that are equivalent up to permutation; this requires ex- 
plicit enumeration of joint (i.e., multi-target) hypotheses. The minimum mean 
optimal sub-pattern assignment (MMOSPA) filter in [52] computes the estimate 
that minimises the MOSPA criterion, necessitating a numerical integration in 
joint target state space. Both of these methods are intractable for all but the 
simplest problems. 

At the other end of the scale, MeMBer avoids coalescence by collecting all 
hypotheses that were updated with a particular measurement in the latest scan 
into a single track. While this is computationally tractable, it loses continuity 



23 



of tracks, and the accuracy of the approximations involved are problematic in 
low SNR conditions |1, p668, pp678-681]. 

The proposed method is a hybrid between the previously described RFS 
filter and a hard, MAP assignment method. Practically, coalescence is one of the 
major reasons why MAP methods such as MHT are favoured over alternatives 
such as Gaussian sum filters, so using MAP to resolve coalescence in a marginal 
track filter is reasonable in principle. 

In the single scan case, the problem of finding the MAP hypothesis is easily 



solvable using the auction algorithm 53 . Subsequently, the weight adjustment 



can be made by solving the following network flow problem (which can be per- 
formed efficiently using the network simplex algorithm |11| ). The formulation 
seeks maximise the transfer of association probability to the MAP association 
solution. We denote by p l (a l ) the marginal weight of association event a 1 under 
track i (found approximately via the BPMTF), and by a 1 * the MAP association 
event in the 2D assignment (a" = if the target is missed). The network flow 
formulation is then: 



maximise ^ S i (a 1 * ) (35) 

subject to ^2 Si ( a ) = V i e T 

53$*(a) = Vac \JW 

<*V*)>0 V^eT 

- P\a)q l ' a < <S» < V i e T, a G ^{a**} 

5 i (a)=0 VieT, a<£H l 

5 4 (0)=O VieT 

The final two constraints effectively remove the corresponding variables from the 
optimisation. Upon completion, the marginal probabilities of every hypothesis 
of every track are modified to: 

?(a)=p(a)+5*(a) 
p 1 (a) 

For each track i G T with a 1 * =^ 0, the hypothesis and existence-conditioned 
distribution is modified to: 

n, a *' (x) = P'(^)g^ a "r- a "(^)-E^r,^^' , ( a ")/ I '' a "( a; ) 

p i {a i *)q i > ai * 

For remaining hypotheses, f l ' a (x) — j l ' a {x). Appendix [pjjshows that the update 
rules modify neither the existence probability of each track nor the overall first 
moment of the distribution. 
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This exchange results in a loss of track identity information; the resulting 
distribution of track identity can be maintained through an additional layer 
{e.g., [54]). 

5 BP-MeMBer filter 

The previous sections have described the BPMTF, and the necessary modifica- 
tions for coalescence avoidance. As described in Section [2~5"1 one key difference 
between the proposed method and the MeMBer is the reversal of the hypoth- 
esis branching structure. This section describes an algorithm which we term 
the belief propagation multi-target multi-Bernoulli filter (BP-MeMBer), which 
forms tracks in a similar manner to the MeMBer, but uses the LBP methods 
described in Section [3] to calculate marginal probabilities that are cognisant of 
the constraints from the previous time step. 

Consider the information contained in the two sets of association variables, 
a = (a l )i S r an d b — (6 J ) je 7^. The value of the random variable a 1 is the 
measurement (subset) associated with target i, while the value of V is the index 
of the target associated with measurement j. These two sets are redundant: 
given a one can perfectly reconstruct b, and vice versa. Accordingly, we can 
represent the updated multi-target PGF1 equivalently as 

Gfrfilt+iN = X)Pt+i|t+i(o) II G t+i\t+iW 

a ieTt+i 

or 

<WiM=£*"W 6 > II G Kfm +1|6 "%] 

b i&Tt + i 

The essential difference between the BPMTF and the BP-MeMBer is that of 
approximating the target association distribution as p(a) ~ YiieTP 1 ^ 0,1 ^ ^ n 
the BPMTF), versus approximating the equivalent measurement association 
distribution as p(b) w Ujen^i^) ( in the BP-MeMBer). 

To proceed, assume that the predicted distribution carried forward from the 
previous time step is multi-Bernoulli, i.e., there is a set of tracks Tt, each of 
which consists of a sing lc hypothesis W t+1]t = {0}Q Let n t = \Tt\, and denote 
by Zt+i = {z 1 , . . . , z mt+1 } the set of measurements at time (t+1), so that Tt+i — 
{1, . . . , n t + mt+i}. Let TZ t +i = {1, • • • , m t+1 + n t }; resources {1, . . . , m t +i} 
correspond to the measurements, while resources {m t +i + 1, . . . ,m t +i + n t } 
are pseudo- measurements corresponding to the respective tracks {1, . . . , n t } ex- 
periencing a missed detection. Accordingly, each input track i will result in 
(m t+ i + nt) updated hypotheses, with (n t — 1) of these having zero weight 
(since w^I^t^x" 3 — 0, j ' ^ i, j > by construction), and similarly each new 

19 This single hypothesis may involve a multi-modal distribution, but there are no asso- 
ciation constraints between targets. The hypothesis is denoted as association constraints 
from previous scans are represented by adjusting the conditional track distribution using the 
marginal association probabilities. 
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Track (i 








Existing tracks 


New tracks 


Measurement (j £ TV) 




1 2 3 


4 5 


Measurements 


1 


0.5 0.2 0.1 


0.2 




2 


0.1 0.2 0.6 


0.1 


Pseudo- measurements 


3 


0.4 





(missed detections) 


4 


0.6 







5 


0.3 






Table 1: Example case showing difference between BPMTF and BP-MeMBcr. 
Tracks carried forward in BPMTF correspond to columns of table, whereas 
tracks in BP-McMBer correspond to rows in table. 



target track i £ {n t + 1, . . . , n t + m t +i} will consist of (n t + rrit+x) hypotheses 
where only a single hypothesis has non-zero weight. We can exactly represent 
the updated PGF1 as: 

Applying the alternative approximation p t+ i\ t +i 0) ~ Yljen t +i Pt+yt+i^), we 
obtain 

The BP-MeMBer uses this expression, replacing the marginal probability 
p^ +1 u_l_ 1 (6 J ) with its LBP approximation i>( +1 | t+1 (& 7 ). 

To provide a concrete example, consider a case in which the marginal prob- 
abilities are as detailed in Table [l] Using the BPMTF, one track would be 
generated for each column in the table. Conversely, using the BP-MeMBer, one 
track would be generated for each row in the table. The two methods provide 
alternative approximations of the joint distribution. 



6 Undetected targets and recycling 

Recall again the form that the BPMTF maintains from one time interval to the 
next: 

G tlt [h}<xen>W\t[h}}- U G i\M (36) 

ieTt 

Note that this is a union of independent components: a Poisson point process 
component, and a series of independent multi-target Bernoulli components]^] 
Since a new track must be created for every new measurement received, tracks 
will inevitably need to be deleted from the system. The system designer must 

20 The multi-target Bernoulli components are not naturally independent but they are ap- 
proximated as such, as in M-C |], and the MeMBer pi. 
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trade off system performance against computational complexity in constructing 
this mechanism. 



One possibility which arises from the form of ( 36 1 is to approximate indi- 
vidual tracks as being Poisson, rather than deleting them. Consider a single 
multi-target Bernoulli component: 

fi-<4' x = % 

f tlt {X)=Ui lt fi lt (x), X = {x] 
lo, \X\>1 

We may choose to approximate this component as being Poisson: 
f tlt (X) = exp(-A* |t ) J] Xi ]t (x) 

where X\, t = J X^ t (x)dx, and we define f^ t (x) = AJ| t (x)/AJ| t for later use. The 
distortion caused by this approximation may be measured by the multi-target 
KL divergence: 111 p513] 

D(fi {t (X)\\fi lt (X))= [ fi lt (X)log f Jf^SX 

= (i _ g j ) bg \ q l\ + [ 4 lt fU x ) log q W X \ dx 

g exp(-AJ |t ) J*\*W> 8 JAJ,^*) 

= (1 - ?j| t )log(l " ?t|t) + C 1 - 4t) A t|t + 9*|tPogffj|* + A ^ - lo S A *| t ] 
+ 4^/^)11/^)) 

It is optimal to set fl\ t {x) = fl t (x) and AJ. 4 = gji t , so that X l t ^(x) — <f t \ t fl\ t {x) 
(as shown in \U p579]). The value of the KL divergence at this optimal choice 
is: 

D(fi lt (X)\\fi ]t (X)) = ql lt + (1 - <4)log(l - qi ]t ) 

The value of this KL divergence is shown in Fig. [4] as a function of the tar- 
get existence qi r The figure demonstrates that the distortion caused by the 
approximation (as measured by the KL divergence) is very small for existence 
probabilities less than 0.1. 

In Appendix [O we show that the KL divergence between the overall multi- 



target distribution comprised of independent components (e.g., (36)) and a 
modified multi-target distribution in which approximations have been made 
to a number of these components is bounded by the sum of the KL divergences 
between the components and their respective approximations. Thus the over- 
all distortion we cause to the complete multi-target distribution is bounded by 
the sum of the individual track distortions, which depends only on the track 
existence probability. Accordingly, we may approximate the tracks with lowest 
existence probabilities such that the sum of the distortions is less than an overall 
distortion budget. 
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KL divergence between Bernoulli and Poisson 




Existence probability {q\i t ) 



Figure 4: Multi-target KL divergence between multi-target Bernoulli distribu- 
tion and best-fit Poisson distribution as a function of target existence probability 

When any number of Bernoulli tracks are approximated as being Poisson, the 
resulting multi-target distribution is equivalent to one in which those tracks are 
dropped, and their intensity is added onto the undetected target distribution. To 
confirm this, denote the subset of tracks that we retain as %; the approximated 
distribution is then 

G t \ t [h] oc exp{X^ t [h}} ■ ]J & t[t [h] ■ JJ G\ t [h] 
ieT t \tt iett 

cxexp{A^ [/*]}• J] exp{Xi lt [h]}-l[Gi lt [h} 

ieT t \tt iett 

= expl\? lt [h}+ £ \\ lt [h] I • J[ Gi {t [h} 
{ ieT t \tt ) ie% 

= exp{A^[/ l ]}- Y[Gi [t [h] 

iett 

where 

\% t [h] = K\Ah}+ E A *I*W 
ieTt\tt 

We refer to this concept as recycling, since the tracks that we delete re-enter 
the system through the undetected target intensity. Not surprisingly, it can be 
shown that if the prior distribution is purely Poisson (i.e., there are no pre- 
existing target tracks) and we choose to recycle all posterior tracks, then the 
posterior Poisson distribution is equivalent to that obtained using the PHD. 
By recycling a subset of tracks, we permit the large mass of tracks with low 
probability of existence to be represented efficiently by the Poisson distribution, 
while maintaining explicit Bernoulli tracks on the subset with non-negligible 
probability of detection. Representing low probability of existence tracks via 



the Poisson distribution reduces the computational complexity due to data as- 
sociation. Furthermore, if the Poisson distribution is represented as a discrete 
grid, then there is no computational cost associated with representing additional 
tracks. 

In practice, this approximation allows the system to gradually accrue confi- 
dence in the presence of a target before choosing to maintain an explicit track. 
As shown in equation (13), the existence probability is proportional to the un- 
detected target intensity (in the vicinity of the measurement) divided by the 
undetected target intensity plus the false alarm intensity, so the intensity added 
by recycling will cause the existence probability of a new track due to a later 
measurement in the same vicinity to be increased, reducing the likelihood that 
the track will again be recycled. This structure is likely to be most advantageous 
in very low SNR environments, where there is a large number of false alarms, 
and confidence in target existence must be accrued over a significant time. 



7 Conclusion 

We have shown that, under common assumptions, the full RFS distribution 
can be represented via a structured form, which admits application of graphical 
model inference methods. A heuristic, computationally tractable method was 
presented for addressing the problem of coalescence in multiple target track- 
ing problems. We described the benefit of maintaining a discretised Poisson 
representation of undetected targets alongside the conventional representation, 
introducing the concept of target recycling, which maintains the first moment 
of the distribution upon deletion of tracks. The next step in this research is 
to compare the proposed methods to existing filters such as the [C]PHD and 
MeMBer. 



A Linearity of probability generating functional 

In general, linearity of a probability generating functional is straight-forward, 
following directly from linearity of the integration operator. This section con- 
firms that this remains the case in the random finite set case, in which the 
integral is a set integral. The required property is that: 

Lemma 2. The RFS probability generating functional is linear, i.e., 

J h x [af 1 (X)+f3f 2 (X)]5X = a J h x h(X)SX + J h x f 2 (X)SX 
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Proof. This follows simply since: 
h x [af 1 (X)+pf 2 (X)]5X 



[a/i(0)+/3/ 2 (0)] + 



E 



xf\{{xi, . . .,x n }) + (3f 2 ({x!, . . . ,x n })] dxi, • • • ,dx n 



OO _ " 71 

a(0)+e / n^) 

-^|/ 2 (0)+£ | 



/i({xi, . . . ,x n })dx!, ■ ■ ■ ,dx n > + 



Y[h( Xi ) 



f 2 ({x 1 ,...,x n })dx 1 ,--- ,dx r , 



a / h*h{X)5X + p / h x f 2 (X)6X 



□ 



Note that this is indeed a general property of set integrals, applying not only 
to the PGF1 transformation. 



B Properties of coalescence avoidance 
algorithm 

This section establishes two properties that are maintained by the coalescence 
avoidance algorithm in Section [3] First, we show that the probability of exis- 
tence of each track is unmodified. 

Lemma 3. For each track i € T, 

aEH* new 

Proof. 

since YlaeW ^( a ) = 0- ^ 

Lemma 4. The network flow coalescence updates in Section [^] do not change 
the first moment of the distribution, i.e., 

J2 E p l (aW' a f ha (x) = E E fW' a f' a (x) 

i€T oeW teT new.* 
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Proof. 



= E 

ieT 

= E 

ieT 



a£« i \{a"} 

p f (o <, ) 9 < ' a "/ i ' a "(s)- £ <fV*).f V >) + 



+ ^ [f(a)q l - a + S\a)}P a (x) 
oew s \{o**} 

= E £ PW' a f ' a (*)+ 



+ £ 

ieT 



aeH'\{a") i'eT\{i} 



Consider a term in the final line, ieT and a € H % \{a % *}. Suppose that $ i' 
such that </* = a. Then f(a) = V i since <P(a) < V i and £\ <?( a ) = 0. 
Now suppose that there is a unique i' with a' 1 * = a. In this case, the negative 
term in the second sum on the final line will cancel the term in the first sum. 
Finally, suppose that there are multiple i' with a 1 * — a. By the constraints on 
the MAP solution, the only way this can happen is if a = 0, so that 6 l (a) = 0. 
Thus the final line is zero. □ 



C Sub-additivity of KL divergence 

The following theorem establishes a result that the KL divergence between a 
multi-target distribution and the distribution that results from approximations 
made to a series of independent subcomponents of the distribution is bounded by 
the sum of the KL divergences between the subcomponents and their respective 
approximations. We prove the result for two independent subcomponents; the 
general case results simply from induction. 

Theorem 1. Let 

f(x)= g(w)h(x - w) 

wcx 

and 

f(X) = J2 9(W)h{X - W) 
wcx 

Then 

D(f\\f) < D(g\\~g) + D(h\\h) 
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Proof. 

J2wcx9(W)h(X-W) 



D(f\\f) = 



J2 g{W)h(x~w) 



wcx 



log i 



Ewcx~g(w)h(x-w) 

J ^ x g(W)h(X-W) 

= [ E 9(W)h(X-W)lo g 9 ^±5X 
J wcx yy ' 



^D^) + £>(/#) 

where (a) is a consequence of the log-sum inequality, [42j p29], and (b) is 
result of Lemma [5l 



Lemma 5. 



f J2 a(W)b(X - W)5X = f a(W)SW- f b{Y)SY 



wcx 



Proof. For simplicity, let f ■•• J a({xi, . . . , x n })dxi ■ ■ ■ dx n = a(0) when n 
Then: 

/ a(W)b(X-W)SX 

J WCX 

E a(W)6({a:i,...,a:n}-W r )da: 1 -..da! T . 

n=0 n ' ^ ^ WC{xi,...,x n } 

= Yj E — j j a({x i \i€l})b({xi\i<£l})dx 1 ---dx n 

n=0/C{l,...,n} U ' J J 

EEl- i7 ^ '/""•/ a ^ Xl ' • ■ ■ ' z™})^! ' ' ' da; ™- 

z — ' * — ' «' m!(n — m)! J J 

b({x m+ i, x n })dx m+ i ■ ■ ■ dx n 
EE^ / " / a ({ x i' • • •!^m})da>i • ••dx m - 
. . . ,xi})dx! ■■■dxt 
a{W)5W ■ I b(Y)SY 



(a) 



OO OO 

(b) 



m=0 1=0 
1 
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where (a) results from the observation that terms in the sum over / with the 
same number of elements in I will have the same value, and (b) results from 
letting I = (n — m) and reordering terms in the two-dimensional summation. □ 
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